Pengantar Pemrograman Triton: Berpindah dari Thread ke Instance Program

Dalam Triton, satuan dasar eksekusi berpindah dari thread skalar CUDA ke Instance Program. Ini merupakan abstraksi dari blok thread GPU, di mana satu instance menangani blok vektorisasi elemen secara bersamaan.

1. Identitas Instance Program

Setiap unit eksekusi mendapatkan identitasnya melalui pid = tl.program_id(axis=0). Bayangkan sebuah Forklift Gudang (Instance Program) mengambil sebuah Palet (blok) berisi 128 kotak, dibandingkan dengan satu pekerja (thread CUDA) yang mengambil satu kotak.

2. Triton vs. Tensor PyTorch

Memahami kesenjangan semantik sangat penting untuk manajemen memori:

Tensor PyTorch: Objek Python sisi host yang membungkus penyimpanan VRAM, stride, dan metadata.
Tensor Triton: Objek tingkat kompiler yang mewakili nilai atau pointer yang terletak di register atau SRAM.

Tampilan PyTorch
Objek Python yang menunjuk ke memori global yang kontinu.

Tampilan Triton
Blok data 2D/1D di dalam register kompiler.

3. Sifat SPMD

Triton mengikuti pendekatan Program Tunggal, Data Banyak (SPMD) alur. Setiap instance program menjalankan persis sama kode. Perbedaan hanya terjadi ketika logika menggunakan pid untuk menghitung offset memori tertentu.

TERMINALbash — 80x24

> Ready. Click "Run" to execute.

QUESTION 1

What is the primary identifier for a Triton execution unit?

threadIdx.x

tl.program_id(axis=0)

tl.block_idx()

torch.get_id()

QUESTION 2

True or False: A Triton tensor is a Python object that stores metadata like strides on the host CPU.

True

False

QUESTION 3

What is the result of 'forgetting that all program instances execute the same kernel body'?

The compiler will automatically distribute tasks.

Race conditions or overwriting memory if pid-based logic is missing.

The kernel will fail to compile due to a syntax error.

Execution time will double.

QUESTION 4

In the forklift analogy, what does the 'Aisle Number' represent?

The BLOCK_SIZE

The program_id (pid)

The GPU Driver version

The Pointer address

QUESTION 5

Why is the Triton model considered 'Vectorized' compared to CUDA?

It uses Python lists.

One Program Instance handles a block of elements, not just one scalar element.

It only works with 2D matrices.

It runs on the CPU's SIMD units.